Chapter 8 Where do I get data❓
8.1 Websites
Here, I list and very briefly describe several excellent sources for free and relevant geo data.
- gbif The Global Biodiversity Information Facility hosts information on the distribution of different species. There is also an R package for directly quering GBIF from R.
- awsome geodata A curated list of geo data providers.
- Corine Land Cover European land use and land cover data set.
- EarthEnv Global raster data sets for many relevant variables, including freshwater variables.
- Ecology & Fish Data Explorer Data on river biota from the UK.
- EEA Datahub Data provided by the European Environmental Agency.
- EU Crop Map High-resolution raster map of European crops.
- Eu Hydro-DEM European river network and catchment data base.
- GADM Global data on administrative areas.
- INSPIRE Infrastructure for spatial information in Europe, lots of official data sets from EU countries.
- JRC Data Catalogue Data sets provided by the European Unions Joint Reserach Center.
- Naiades Data on surface water quality in France.
- USGS Earth Explorer Source for LandSat satellite data. Find a tutorial on how to use it here.
- worldclim Global raster dataset with many important variables.
This list is by no means exhaustive. If you find some source you think would be valuable here, please let me know and I will consider adding it.
8.2 Packages
There is a selection of R packages specifically designed to provide data. Among them are geodata, , tigris, and prism.
8.2.1 geodata
The geodata package is written and maintained by Rob Hijmans who also wrote the raster and terra packages.
The purpose of the package is to provide an easy-to-use interface to download different handy spatial data sets directly from R.
Here, we will go through most of the functions that this package provides, and see we how we can use them.
8.2.1.1 cmip6
With this function, you can download climate projections from the Coupled Model Intercomparison Project 6 (CIMP6, see here for more information).
Shortly, CMIP is an multi-group project that compares the projection of different climate models and also combined them in so called ensemble models to project future climates.
The function has six arguments cmip6_world(model, ssp, time, var, res, path).
model determines which of the CIMP6 models you want to get the results from.
You can select between: “BCC-CSM2-MR”, “CNRMCM6-1”, “CNRM-ESM2-1”, “CanESM5”, “GFDL-ESM4”, “IPSL-CM6A-LR”, “MIROC-ES2L”, “MIROC6” and “MRI-ESM2-0”.
Each model represents the efforts of one working group namely: the Bejing Climate Center (BCC-CSM), the Centre National de Recherches Météorologiques (CNRMCM and CNRM-ESM); the Canadian Centre for Climate Modelling (CanESM); the Geophysical Fluid Dynamics Laboratory (GFDL-ESM4), the Institute Pierre Simon Laplace (IPSL), the Japan Agency for Marine-Earth Science and Technology (MIROC-ES2L and MIROC6) and lastly, the Meterological Research Insitute (MRI-ESM2-0).
ssp is the code for a “Shared Socio-economic Pathway” (see here and here). Possible codes are “126”, “245”, “370” and “585”. The first number here gives the general narrative which start at very sustainable (SSP1) and grow more and more fossil fuel dependend (SSP5).
time refers to the time period that you would like to have the results for? Possible intervals are: “2021-2040”, “2041-2060”, or “2061-2080”.
var determines which variable is returned? The climate models above return: minimum temperature (“tmin”), maximum temperature (“tmax”), average temperature (“tavg”), precipitation (“prec”) and biolcimatic variables (“bioc”, see here for an explanation).
res is the resolution of the raster that is returned. Valid values are 2.5, 5 and 10 (minutes of a degree).
Lastly, path refers to the folder you want to save the data in?
Enter the path to the designated folder here.
To demonstrate the functions capabilities, we will download the global monthly minum temperatures from the Chinese model, for the foreseeable future on a sustainable socioeconomic pathway, in a resolution of 10 minutes of a degree (i.e. 1/6th of a degree).
bcc.10.tmin <- geodata::cmip6_world(model = "BCC-CSM2-MR",
ssp = "126",
time = "2021-2040",
var = "tmin",
res = 10,
path = "geodata/")Lets have a look at the object.
## class : SpatRaster
## dimensions : 1080, 2160, 12 (nrow, ncol, nlyr)
## resolution : 0.1666667, 0.1666667 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : wc2.1_10m_tmin_BCC-CSM2-MR_ssp126_2021-2040.tif
## names : wc2.1~040_1, wc2.1~040_2, wc2.1~040_3, wc2.1~040_4, wc2.1~040_5, wc2.1~040_6, ...
## min values : -46.84375, -50.7, -62.3125, -68.0250, -68.70000, -67.000, ...
## max values : 28.35000, 28.1, 28.8000, 29.2625, 30.46875, 31.925, ...
From a quick look we can see that it is a SpatRaster with 12 layers (months of the year).
It is in longitude and latitude.
The code below uses the tmap package to save the 12 rasters as a animation in gif format, which is displayed underneath.
anim <- tm_shape(bcc.10.tmin) + tm_raster() + tm_facets(nrow = 1, ncol = 1)
tmap_animation(anim, "anim_file.gif")
8.2.1.2 country_codes
The country_codes() function is different from most other functions included in this package.
It does not download spatial data but instead provide a table with different codes and IDs for countries.
Below you can see the first few lines.
## NAME ISO3 ISO2 NAME_ISO
## 1 Afghanistan AFG AF AFGHANISTAN
## 2 Akrotiri and Dhekelia XAD <NA> <NA>
## 3 Åland ALA AX ÅLAND ISLANDS
## 4 Albania ALB AL ALBANIA
## 5 Algeria DZA DZ ALGERIA
## 6 American Samoa ASM AS AMERICAN SAMOA
8.2.1.3 elevation
The are three different elevation functions.
Each loads a digital elevation model (DEM, in this case from the Shuttle Radar Topography Mission, see here).
The three functions are elevation_3s(), elevation_30s() and elevation_global().
elevation_30s() has a country argument which you can supply the name or ISO3 code (see function country_code) and it will download the respective DEM.
As I am writing this, the elevation_3s() function does not seem to work, there is a problem with the server.
The elevation_global() function downloads a DEM for the entire globe in a specified resolution.
dem_andorra <- elevation_30s(country = "Andorra", path = "geodata/")
dem_global <- elevation_global(res = 10, path = "geodata/")Just as example, and because its relatively small, I downloaded the DEM for Andorroa. You can explore it in the interactive map below.
## tmap mode set to interactive viewing
The DEM for the whole world is to large to display interactively, so we will just plot it staticly.

8.2.1.4 gadm
The function gadm() returns a SpatVector (the terra vector class) of administrative boundaries.
It has 3 arguments. country is the country name or ISO3 code (see function country_code()).
level is the the administrative subdivision, the higher the level the more detail the map has.
For higher levels this is quite extensive.
To find Landau as an object in the data set, for example, you only need to go to level 3
Here we only look at level 0 (all of Germany) and level 1 (federal states).
ger0 <- gadm(country = "DEU", level = 0, path = "geodata")
ger1 <- gadm(country = "DEU", level = 1, path = "geodata")Alternatively, we can query the whole world with world()
On this map we see that the polygons were created on a plane.
Geometries that cross the antimeridian (180° longitude) are stretched across the whole globe.
This is also the reason that we need to use st_make_valid() which makes to display the data.
This problem occurs because the sf package assumes spherical geometries.
We can explicitly tell it to not do so and avoid the ugly smears on the map.
## Spherical geometry (s2) switched off
8.2.1.5 sp_occurrence
The sp_occurrence() function can be used to download data from the global biodiversity information facility (GBIF).
We provide the function with genus and species names we want to download observations for.
Here Bubo bubo, the Eagle Owl.
This data set has may columns and lots of observations.
To simplify things, we remove all columns except longitude and latitude (dplyr::select()) and only keep observations that have data in these columns (dplyr::filter()) and transform it into a simple feature geometry object.
occ.bb <-
occ.bb |>
select(lon, lat) |>
filter(!is.na(lon) & !is.na(lat)) |>
st_as_sf(coords = c("lon", "lat"),
crs = 4326)Further, we omit all observations outside of Germany with an intersection.
As a last step to simplify this map, we will lay a grid of hexagons (hexgrid) over Germany and count the number of observations in each cell.
First, we need to create the grid, with the st_make_grid() function.
#- create hexagonal grid over Germany
ger.grid <- st_make_grid(x = occ.bb,
square = FALSE,
cellsize = .5)This grid is still a square of hexagons.
To reduce it to the actual area of Germany we intersect it we the administrative boundaries from ger0.
The result of the intersection a simple feature collection (sfc) and we need to transform it to sf to aggregate the observations.
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a multiple of vector length (arg 1)
We use the sf::st_contains() function to count the number of observations (points in occ.bb) per polygon in ger.grid.
## although coordinates are longitude/latitude, st_contains assumes that they are planar
8.2.2 osmdata
Here, I will introduce you to the osmdata package. osm is short for open street map and you can use the package to access the data from osm directly from R.
osm uses the read-only overpass API to download data from Open Street Map and read them in the formats of sp, sf or silicate. A complete call with osmdata looks like his:
As you can see, four different functions are part of the call.
getbb()opq()
add_osm_feature()osmdata_sf()
Lets go through them one after another to see what they do.
getbb() returns the bounding box of some place in the world.
The only argument you usually have to supply is place_name.
This worked, and returned the bounding box of Landau in longitude and latitude.
Using our new bounding box we can now call opq().
Of course you can also write a custom bbox directly into opq().
This function builds a query to Overpass (overpass query).
## $bbox
## [1] "49.1548639,7.8646346,49.3163444,8.180767"
##
## $prefix
## [1] "[out:xml][timeout:25];\n(\n"
##
## $suffix
## [1] ");\n(._;>;);\nout body;"
##
## $features
## NULL
##
## $osm_types
## [1] "node" "way" "relation"
##
## attr(,"class")
## [1] "list" "overpass_query"
## attr(,"nodes_only")
## [1] FALSE
This is no spatial object yet but just the request (query) that will be send to the API.
This request is further expanded in add_osm_feature().
All items in the Open Street Map data have a key and a value.
A few example are key: amenity - value: bar, cafe or fast_food; key: building - value: school, public or church, key: landuse - value: basin, cementery, grass.
See here for a complete overview of all keys and features.
Let’s download one of each.
ld_amenity = add_osm_feature(ld, key = "amenity", value = "bar")
ld_building = add_osm_feature(ld, key = "building", value = "school")
ld_landuse = add_osm_feature(ld, key = "landuse", value = "grass") Now our queries are complete and we can download the data using osmdata_sf().
The suffix of osmdata_sf() specifies that we want the downloaded data to be in the sf format.
Alternatives are: osmdata_sp,osmdata_sc and osmdata_xml.
The resulting objects are lists which have the different geometries as elements. Below we plot the three data sets together.
## tmap mode set to interactive viewing